
[Bugfix] LoRA for DeepSeek V3.2#35077

Merged
jeejeelee merged 18 commits into vllm-project:main from HollowMan6:fused_qkv_a_proj
Apr 22, 2026

Conversation

Contributor

@HollowMan6 HollowMan6 commented Feb 23, 2026

Purpose

This PR fixes LoRA regressions seen with DeepSeek V3.2/DSA:

  1. LoRA module registration failed for fused_qkv_a_proj with an assertion that the module was not a BaseLayerWithLoRA.
  2. After that fix, MLA weight post-processing failed with AttributeError: 'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'.
   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
     return self.lora_manager.create_lora_manager(model, vllm_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in create_lora_manager
     lora_manager = create_lora_manager(
                    ^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 895, in create_lora_manager
     lora_manager = lora_manager_cls(
                    ^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 807, in __init__
     super().__init__(
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 111, in __init__
     self._create_lora_modules()
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 407, in _create_lora_modules
     self.register_module(module_name, new_module)
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 414, in register_module
     assert isinstance(module, BaseLayerWithLoRA), (
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 AssertionError: Module model.layers.0.self_attn.fused_qkv_a_proj must be a BaseLayerWithLoRA instance, got <class 'vllm.model_executor.models.deepseek_v2.DeepSeekV2FusedQkvAProj'>
File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
     output = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
   File "/mnt/data/user/songlin/verl/verl/workers/rollout/vllm_rollout/utils.py", line 273, in update_weights_from_ipc
     process_weights_after_loading(model, model_config, self.device)
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 117, in process_weights_after_loading
     module.process_weights_after_loading(model_config.dtype)
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/layers/attention/mla_attention.py", line 655, in process_weights_after_loading
     kv_b_proj_weight = get_and_maybe_dequant_weights(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/quant_utils.py", line 333, in get_and_maybe_dequant_weights
     if layer.quant_method is None or isinstance(
        ^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1965, in __getattr__
     raise AttributeError(
 AttributeError: 'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 771, in worker_main
    worker = WorkerProc(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 597, in __init__
    self.worker.load_model()
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 336, in load_model
    self.model_runner.load_model(load_dummy_weights=dummy_weights)
  File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4222, in load_model
    self.model = self.load_lora_model(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
    return self.lora_manager.create_lora_manager(model, vllm_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in create_lora_manager
    lora_manager = create_lora_manager(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 895, in create_lora_manager
    lora_manager = lora_manager_cls(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 807, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 111, in __init__
    self._create_lora_modules()
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 407, in _create_lora_modules
    self.register_module(module_name, new_module)
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 414, in
    assert isinstance(module, BaseLayerWithLoRA), (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Module model.layers.3.mlp.gate must be a BaseLayerWithLoRA instance, got <class 'vllm.model_executor.layers.fused_moe.router.gate_linear.GateLinear'>
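The first failure boils down to how the LoRA layer-replacement check matches layer types. The toy sketch below (not vLLM's actual classes, just stand-ins reusing the real class names) shows why an exact-type comparison misses subclasses such as DeepSeekV2FusedQkvAProj, leaving them unwrapped so that register_module() later hits the assertion:

```python
# Stand-ins for the real vLLM classes, just to illustrate the type check.
class MergedColumnParallelLinear:
    pass

class DeepSeekV2FusedQkvAProj(MergedColumnParallelLinear):
    pass

def can_replace_strict(layer) -> bool:
    # Pre-fix style of check: exact type only, so subclasses fall through.
    return type(layer) is MergedColumnParallelLinear

def can_replace_fixed(layer) -> bool:
    # Post-fix style of check: isinstance() also accepts subclasses.
    return isinstance(layer, MergedColumnParallelLinear)

layer = DeepSeekV2FusedQkvAProj()
print(can_replace_strict(layer))  # False -> layer never gets a LoRA wrapper
print(can_replace_fixed(layer))   # True  -> layer is wrapped as expected
```

With the strict check, the fused projection layer is skipped during LoRA module creation, which is exactly the state the assertion in register_module() detects.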

Test Plan

Added unit test cases, and also ran an end-to-end test manually.

Test Result

All tests pass without the errors above.




Copilot AI review requested due to automatic review settings February 23, 2026 04:28

dosubot Bot commented Feb 23, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


mergify Bot added the labels deepseek (Related to DeepSeek models) and bug (Something isn't working) on Feb 23, 2026
gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request addresses two LoRA regressions related to DeepSeek V3.2/DSA models. The changes primarily involve modifying type checks from type(obj) is Class to isinstance(obj, Class) to correctly handle subclasses, and introducing a mechanism to unwrap LoRA linear wrappers before accessing quantization metadata. The added test cases validate these fixes, ensuring that LoRA modules are registered correctly and weight post-processing functions can access quant_method attributes as expected. The changes are well-targeted and directly resolve the reported issues, improving the robustness of LoRA integration with various model architectures.

Comment thread tests/lora/test_layers.py Outdated
Comment thread tests/lora/test_layers.py Outdated
Comment thread vllm/model_executor/layers/quantization/utils/quant_utils.py
Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes two LoRA regressions encountered with DeepSeek V3.2/DSA that prevented LoRA adapters from being applied to the custom DeepSeekV2FusedQkvAProj layer, which is a subclass of MergedColumnParallelLinear.

Changes:

  • Modified LoRA layer replacement logic to support subclasses of MergedColumnParallelLinear by changing type() is checks to isinstance() checks
  • Added unwrapping logic in get_and_maybe_dequant_weights() to handle LoRA wrappers transparently by accessing the underlying base_layer
  • Added comprehensive test coverage for both fixes
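The unwrapping idea from the second bullet can be sketched framework-free as follows. The class and function names here are illustrative stand-ins, not vLLM's actual API (the real change lives in get_and_maybe_dequant_weights()):

```python
class QuantLinear:
    """Stand-in for a linear layer carrying quantization metadata."""
    quant_method = None  # None means "unquantized" in this toy example
    weight = "W"

class LoRAWrapper:
    """Stand-in for BaseLayerWithLoRA subclasses: they keep the original
    layer as .base_layer and do not mirror its quant_method attribute,
    which is why the AttributeError in failure 2 occurred."""
    def __init__(self, base_layer):
        self.base_layer = base_layer

def unwrap_base_layer(layer):
    # Peel off any LoRA wrappers so callers read quantization metadata
    # from the underlying layer instead of the wrapper.
    while hasattr(layer, "base_layer"):
        layer = layer.base_layer
    return layer

wrapped = LoRAWrapper(QuantLinear())
base = unwrap_base_layer(wrapped)
print(base.quant_method is None)  # True: metadata is reachable again
```

An unwrapped layer passes through unchanged, so call sites do not need to know whether LoRA is enabled.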

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • vllm/model_executor/layers/quantization/utils/quant_utils.py: Adds automatic unwrapping of LoRA wrappers in get_and_maybe_dequant_weights() to access the base layer's quantization metadata
  • vllm/lora/layers/column_parallel_linear.py: Changes type checks from type() is to isinstance() for MergedColumnParallelLinear to support custom subclasses like DeepSeekV2FusedQkvAProj
  • tests/lora/test_layers.py: Adds test cases for subclassed MergedColumnParallelLinear layer replacement and for get_and_maybe_dequant_weights() with LoRA wrappers


HollowMan6 force-pushed the fused_qkv_a_proj branch 4 times, most recently from 70ae7a3 to 0b1d296 on March 3, 2026 21:17
HollowMan6 force-pushed the fused_qkv_a_proj branch 4 times, most recently from 37a5cc3 to 8430f84 on March 14, 2026 14:05
HollowMan6 force-pushed the fused_qkv_a_proj branch 2 times, most recently from 0e2c03e to fe5f7d2 on March 16, 2026 20:17

mergify Bot commented Mar 17, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @HollowMan6.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify Bot commented Mar 17, 2026

Hi @HollowMan6, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

HollowMan6 force-pushed the fused_qkv_a_proj branch 2 times, most recently from dd3c039 to 162299b on March 18, 2026 15:19
Signed-off-by: Hollow Man <hollowman@opensuse.org>
jeejeelee (Collaborator) left a comment


Overall LGTM except the final comment.

@@ -202,6 +204,12 @@ def all_gather(self, input_: torch.Tensor, dim: int = -1) -> torch.Tensor:
+ (self.world_size * input_size[dim],)
+ input_size[dim + 1 :]
)
# When the gathered dimension has size 1, torch.compile can preserve a
Collaborator commented:

I think we should move these changes to lora/

HollowMan6 (Contributor Author) replied:

Thank you, I just found that this change is actually not necessary after the other fix, so I removed the related changes here.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@HollowMan6 HollowMan6 requested a review from jeejeelee April 20, 2026 06:44
jeejeelee (Collaborator) left a comment

thank you

@jeejeelee jeejeelee enabled auto-merge (squash) April 20, 2026 07:39
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 20, 2026
auto-merge was automatically disabled April 20, 2026 12:07

Head branch was pushed to by a user without write access


mergify Bot commented Apr 20, 2026

Hi @HollowMan6, the pre-commit checks have failed again; the same instructions as in the Mar 17, 2026 comment above apply.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@jeejeelee jeejeelee merged commit a250f1b into vllm-project:main Apr 22, 2026
81 checks passed
@HollowMan6 HollowMan6 deleted the fused_qkv_a_proj branch April 22, 2026 11:35
Comment on lines 1622 to +1629
finally:
# Note: for some reason DeepEP buffers don't seem to be
# entirely reusable on B200. In order to work around this
# we clear the all2all manager's cache after each testpoint.
cap = current_platform.get_device_capability()
if (
cap is not None
and cap.major == 10
and (
test_config.backend == "deepep_low_latency"
or test_config.backend == "deepep_high_throughput"
)
):
# DeepEP managers are not reliably reusable across many subtests in
# a single worker process. Tear them down after each DeepEP case so
# later subtests do not inherit stale communication state.
if test_config.backend in {
"deepep_low_latency",
"deepep_high_throughput",
}:
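The replacement hunk above reduces the teardown guard to a backend check alone: the device-capability gate (cap.major == 10, i.e. B200-only) is dropped, so DeepEP managers are torn down after every DeepEP subtest regardless of GPU. A minimal sketch of that condition (the function name is illustrative, not from the test file):

```python
# Backends whose all2all managers should be cleared after each subtest,
# since stale communication state can leak between DeepEP test cases.
DEEPEP_BACKENDS = {"deepep_low_latency", "deepep_high_throughput"}

def should_clear_all2all_cache(backend: str) -> bool:
    # No device-capability check: any DeepEP backend triggers cleanup.
    return backend in DEEPEP_BACKENDS

print(should_clear_all2all_cache("deepep_low_latency"))    # True
print(should_clear_all2all_cache("deepep_high_throughput"))  # True
print(should_clear_all2all_cache("naive"))                 # False
```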
Member left a comment

I think this breaks the CI #40637

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think so, as all the CI checks passed before the merge, including the specific one you mentioned: https://buildkite.com/vllm/ci/builds/62466/steps/canvas?sid=019db40c-c48c-4238-b1ca-827533eb7d09&tab=output

Also d22887b was introduced specifically for fixing that CI.

Member replied:

Ohh, that makes sense; the nightly build happens at 2am.

baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yifan <yzong@redhat.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Adrian <info@zzit.ch>

Labels

bug (Something isn't working), deepseek (Related to DeepSeek models), ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants